MonoGraspNet

About

This is the project website for MonoGraspNet, the first deep learning pipeline for 6-DoF grasping from a single RGB image. The paper can be downloaded from here. The dataset and related tools from this work will be released to the community.

Schematic overview of MonoGraspNet. Given a monocular image, the Keypoint-Network and Normal-Network predict a keypoint heatmap and a normal map. After 2D keypoint selection and local region cropping, the DWA-Network regresses the remaining grasping parameters. For an exemplary keypoint, given the detected keypoint location, we crop the same regions from the RGB image and the normal map using an adjustable radius r to obtain a 2 x (2r+1) x (2r+1) x 3 feature map, which we then reshape to (2r+1) x (2r+1) x 6. Additionally, we employ RoI Align to aggregate the features and bring them to a size of 112 x 112 x 6, the same as for the other crops. The DWA-Network uses three branches to regress the depth, width, and angle associated with the estimated keypoint. Finally, the visible grasping point (3D Keypoint1) and the invisible grasping point (3D Keypoint2) can be derived.
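The crop-and-fuse step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' released code: the function name, the use of channel concatenation to realize the 2 x (2r+1) x (2r+1) x 3 to (2r+1) x (2r+1) x 6 reshape, and the assumption of an in-bounds keypoint are all ours.

```python
import numpy as np

def crop_keypoint_features(rgb, normals, u, v, r):
    """Crop matching local regions around a detected 2D keypoint (u, v)
    from the RGB image and the predicted normal map, then fuse them
    channel-wise into a (2r+1) x (2r+1) x 6 feature crop.

    rgb, normals: (H, W, 3) float arrays; (u, v) assumed at least r
    pixels from the image border (illustrative simplification).
    """
    # Crop the same (2r+1) x (2r+1) window from both maps.
    rgb_patch = rgb[v - r: v + r + 1, u - r: u + r + 1]      # (2r+1, 2r+1, 3)
    nrm_patch = normals[v - r: v + r + 1, u - r: u + r + 1]  # (2r+1, 2r+1, 3)

    # Stack into a 2 x (2r+1) x (2r+1) x 3 feature map ...
    stacked = np.stack([rgb_patch, nrm_patch], axis=0)

    # ... then fuse to (2r+1) x (2r+1) x 6 by concatenating the two
    # 3-channel patches along the channel axis.
    return np.concatenate([stacked[0], stacked[1]], axis=-1)
```

In the full pipeline, each such crop would then be resampled by RoI Align to the common 112 x 112 x 6 size before entering the three DWA-Network regression branches.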

If you feel that this work has helped your research a bit, please kindly consider citing it:

@inproceedings{zhai2022monograspnet,
  title={MonoGraspNet: 6-DoF Grasping with a Single RGB Image},
  author={Zhai, Guangyao and Huang, Dianye and Wu, Shun-Cheng and Jung, HyunJun and Di, Yan and Manhardt, Fabian and Tombari, Federico and Navab, Nassir and Busam, Benjamin},
  booktitle={IEEE International Conference on Robotics and Automation},
  year={2023},
  organization={IEEE}
}